Research on an Algorithm of Express Parcel Sorting Based on Deep Learning and Multi-Information Recognition

With the development of smart logistics, small distribution centers have begun to use intelligent equipment to read barcode information on courier sheets for express sorting. However, limited by cost, most of them choose relatively low-end sorting equipment and operate it in a complex warehouse environment. Relying on this single source of information lowers the identification rate, which reduces the efficiency of the entire express sorting process. To address these problems, an express recognition method based on deep learning and multi-information fusion is proposed. The method targets the barcode and the three-segment code on the courier sheet and is divided into two parts: target information detection and recognition. For detection, we used a deep learning method, and to improve speed and precision we designed a target detection network based on the existing YOLOv4 network. Experiments show that the detection accuracy and speed of the redesigned network were both improved. For recognition of the two kinds of target information, we first cropped the located regions from the image and used the ZBar algorithm to decode the cropped barcode image. Then we used Tesseract-OCR to recognize the cropped three-segment code image, and finally output the information as strings. This deep learning-based multi-information identification method can help logistics centers accurately obtain express sorting information from the database. The experimental results show that the time to detect a picture was 0.31 s and the recognition accuracy was 98.5%, giving better robustness and accuracy than positioning and recognition of barcode information alone.


Introduction
With the rapid development of the world economy, people's living standards are improving day by day. At the same time, the rapid development of the internet enables more and more consumers to choose convenient online shopping. According to the statistics of the China Post Bureau, the business volume of express service enterprises in China has reached 108.30 billion, up 29.9% year on year, and business revenue has reached 1033.23 billion yuan, up 17.5% year on year. In 2021, this increase in express business put existing logistics systems to a severe test. At present, express sorting relies mainly on the courier sheet and can be divided into automatic sorting, semi-automatic sorting and manual sorting. Automatic sorting uses infrared barcode detection based on radio frequency identification (RFID) technology to obtain delivery information [1]. This method is costly, difficult to popularize, and mainly used in large-scale logistics express sorting centers [2]. Semi-automatic sorting uses a semi-automatic sorting machine based on machine vision. The staff place the parcel with the courier sheet facing upward.
The proposed method consists of two parts: one is the positioning of the one-dimensional barcode and the three-segment code on the courier sheet, and the other is the decoding of the barcode and the recognition of the three-segment code. As shown in Figure 1, the images are first input into the target detection network to locate the two kinds of information. To enable accurate detection in complex environments, we redesigned the key positioning network on the basis of YOLOv4 while preserving both speed and accuracy. The network comprises the backbone feature extraction network of YOLOv4, a spatial pyramid pooling (SPP) layer combined with a cross-stage module, the SE attention module, and an FPN structure. As the backbone feature extraction network, CSPDarknet53 ensures accuracy while greatly reducing the number of parameters. An SPP layer with cross-stage modules replaces the original SPP layer, which helps to maintain accuracy and reduce the number of parameters. The SE attention module was added to enhance features and thereby improve detection accuracy. Replacing the original structure with an FPN structure effectively reduces network complexity and the number of parameters, and the optimized positioning network outperforms the YOLOv4 network. The next step is the recognition of target information. First, the rectangular boxes containing the barcode and the three-segment code are cropped, and the pictures containing the barcode are decoded by the ZBar algorithm. The characters in the three-segment code box are recognized by Tesseract-OCR text recognition. Both strings are written to a text output so that the express sorting information can be accurately obtained from the database. Our main contributions are as follows.

•
We propose a multi-information fusion courier sheet recognition method instead of single information target recognition in the process of express sorting to improve the recognition rate of sorting.

•
The YOLOv4 target detection model was optimized for target information positioning. Compared with other detection networks, it performs better at courier sheet detection.

The rest of this article is organized as follows. Section 2 describes the target detection network used by the method. Section 3 describes the recognition method for the target information. In Section 4, experiments are described that verify the reliability of the detection model and the recognition algorithm, and the whole method is evaluated to verify its feasibility. Section 5 provides the conclusion.

Target Detection Network
The target detection algorithms widely used in current detection tasks are all based on deep convolutional neural networks [14], which can learn features from a large amount of data. At present, detectors are mainly divided into two-stage detectors, such as R-CNN, Fast R-CNN and Faster R-CNN [15][16][17], and single-stage detectors, such as the YOLO [18] series and SSD [19]. A single-stage detector needs only one CNN operation to obtain the result directly. A two-stage detector works in two steps. The first step performs a simple CNN operation, and the second step scores the results obtained in the first step. The candidate regions with high scores are then input into a CNN for final prediction. Because of the candidate regions, the two-stage detector has high accuracy but is not as fast as the single-stage detector. Therefore, for fast real-time target detection, a single-stage detector is preferred. Target detectors can also be divided, according to whether they use anchor boxes, into anchor-free detectors (e.g., CenterNet [20]) and anchor-based detectors (e.g., EfficientDet and YOLOv4 [21,22]). The biggest advantage of the former is speed: there is no need for preset anchors because the boxes are regressed directly, which greatly reduces time consumption and computational cost. The latter has higher accuracy and can extract richer features, but requires more time and computation. Our research weighed this trade-off when selecting the detector.
YOLOv4 was improved on the basis of YOLOv3 [23]. As an efficient and powerful target detection model, it takes into account both speed and accuracy. It is mainly composed of three parts: a feature extraction network, Backbone; a Neck for feature fusion, and a detection Head, Yolo Head, for classification and regression operation.
As shown in Figure 2, a captured picture of the courier sheet is input into the YOLOv4 network. The network first resizes the picture to [3, 416, 416], and then the backbone feature extraction network CSPDarknet53 extracts target features. A shallow feature map, a middle feature map, and a deep feature map are passed to the Neck. After the SPP structure enlarges the receptive field of the deep feature map, the three feature maps are fed into the path aggregation network PANet [24] for repeated feature extraction, and finally into the Yolo Head. The output consists of [52, 52, N], [26, 26, N] and [13, 13, N] feature maps, which are used to detect small, medium and large targets respectively, where N = 3 × (5 + C) and C is the number of target categories.
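The relationship between the input size, the three grid sizes and N can be checked with a few lines of arithmetic. This is a minimal sketch, not the authors' code: the strides of 8, 16 and 32 are the usual YOLO down-sampling factors, and `num_classes=2` is an assumption based on the two targets discussed in this paper (barcode and three-segment code).

```python
def yolo_head_shapes(input_size=416, num_classes=2,
                     anchors_per_scale=3, strides=(8, 16, 32)):
    """Return the (height, width, N) shape of each detection feature map.

    N = anchors_per_scale * (5 + num_classes): each anchor predicts
    4 box offsets, 1 confidence score and one score per class.
    num_classes=2 is an assumption, not a value stated in the text.
    """
    n = anchors_per_scale * (5 + num_classes)
    return [(input_size // s, input_size // s, n) for s in strides]


if __name__ == "__main__":
    for shape in yolo_head_shapes():
        print(shape)
```

With the default arguments the three maps have 52 × 52, 26 × 26 and 13 × 13 cells and N = 21 channels each.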
The loss function of YOLOv4 can be divided into three parts: confidence loss $L_{conf}$, classification loss $L_{class}$, and regression box loss $L_{CIoU}$. $L_{conf}$ and $L_{class}$ are expressed in Equations (1) and (2).
Regression box loss represents the error between the prediction box and the real box. To ensure more accurate results, it considers the overlapping area of the two boxes, the distance between their center points, and their aspect ratios. The regression box loss $L_{CIoU}$ is given in Equation (5).
Note that $\alpha$ and $v$ are penalty terms for the aspect ratio, $w^{gt}$ and $h^{gt}$ are the width and height of the real box, $w$ and $h$ are the width and height of the predicted box, $d$ is the Euclidean distance between the two center points, and $c$ is the diagonal length of the smallest box enclosing both boxes. The loss function of YOLOv4 is expressed in Equation (6).
$$loss(object) = L_{CIoU} - L_{conf} - L_{class} \quad (6)$$
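To illustrate how overlap, center distance and aspect ratio combine, the following sketch computes the commonly published CIoU loss for a pair of boxes using the symbols defined above. The exact form (1 − IoU + d²/c² + αv) and the corner-coordinate box format are assumptions taken from the standard CIoU formulation, not a transcription of the equations in this paper.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2).

    Assumed standard form: 1 - IoU + d^2 / c^2 + alpha * v.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    w, h = px2 - px1, py2 - py1              # predicted box size
    w_gt, h_gt = gx2 - gx1, gy2 - gy1        # real box size

    # Overlapping area and IoU
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = w * h + w_gt * h_gt - inter
    iou = inter / (union + eps)

    # d^2: squared distance between the two center points
    d2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    # c^2: squared diagonal of the smallest enclosing box
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 \
        + (max(py2, gy2) - min(py1, gy1)) ** 2

    # Aspect-ratio penalty v and its weight alpha
    v = (4 / math.pi ** 2) * (math.atan(w_gt / (h_gt + eps))
                              - math.atan(w / (h + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + d2 / (c2 + eps) + alpha * v
```

A perfectly matching box gives a loss close to zero, while two disjoint boxes give a loss above 1 because the center-distance term is added to 1 − IoU.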

SPP Module with CSP Modularization
In deep learning, high-level network layers have a large receptive field and therefore a strong ability to represent semantic information, but their feature maps have low resolution and represent spatial information poorly. Low-level layers, in contrast, have a small receptive field. Spatial pyramid pooling (SPP) [25] was proposed to deal with these problems [26]. After convolution, batch normalization and an activation function, the structure applies maximum pooling with kernel sizes of 5 × 5, 9 × 9 and 13 × 13. The pooled feature maps are concatenated, which changes the number of channels to 2048 while leaving the spatial size unchanged. Integrating different receptive fields in this way enriches the semantic information of the feature maps and effectively improves model performance [27]. CSPDarknet53, the backbone feature extraction network of YOLOv4, is a key factor in the good results obtained with this network. The cross-stage partial network (CSPNet [28]) is a structure proposed from the perspective of network architecture, shown as CSP_X in Figure 2. This structure divides the input into two parts: the backbone part continues the residual stacking, while the other part is connected directly to the end and concatenated channel-wise with the backbone part, which is equivalent to a large residual edge. Splitting first and then merging greatly reduces the number of parameters and the amount of computation, while strengthening the CNN's learning ability and eliminating a computing bottleneck [29]. The computation of a k-layer CNN with b base-layer channels is shown in Table 1. In addition to the CSP structure of the backbone network, we combined the SPP structure described above with the CSP module and optimized it in the network.
This kind of CSP-modularized SPP structure offsets the extra computation introduced by the SPP module and improves accuracy, reducing parameters while preserving accuracy [28]. The improved CSP-SPP module is shown in Figure 3.

Table 1. Computation of a k-layer CNN with b base-layer channels.
Model          Original        To CSP
Dark layer     $5whkb^2$       $whb^2(3/4 + 5k/2)$
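The saving implied by Table 1 can be made concrete by evaluating both expressions. The sketch below is only a worked calculation of the two formulas; the function names and the sample values of w, h and b are illustrative assumptions.

```python
from fractions import Fraction


def dark_layer_cost(w, h, k, b):
    """Computation of k stacked Dark layers: 5 * w * h * k * b^2."""
    return 5 * w * h * k * b ** 2


def csp_dark_layer_cost(w, h, k, b):
    """Computation after CSP conversion: w * h * b^2 * (3/4 + 5k/2)."""
    return w * h * b ** 2 * (Fraction(3, 4) + Fraction(5, 2) * k)


def csp_ratio(k):
    """Ratio of CSP cost to original cost; w, h and b cancel out."""
    return csp_dark_layer_cost(1, 1, k, 1) / dark_layer_cost(1, 1, k, 1)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, float(csp_ratio(k)))
```

The ratio is (3/4 + 5k/2) / (5k), which is 0.65 for k = 1 and approaches 0.5 as k grows, so the relative saving increases with the number of stacked layers.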

Attention Module SE
The attention model was originally used in machine translation and has become an important part of neural networks. An attention module picks out helpful features by assigning weights to different parts of the network's features. Among the many attention modules, the SE module is a classic. It focuses on the relationships between channels so that the model learns which channel features are useful. It first reduces the spatial dimensions of each feature map to 1 × 1 by global average pooling over the width and height, as shown in Equation (7). Then, two fully connected layers and nonlinear activation functions are used to establish connections between channels, as shown in Equation (8). The normalized weights are obtained by a Sigmoid activation function and multiplied onto each channel of the original feature map, completing the channel-wise recalibration of the original features, as shown in Equation (9).
After global average pooling, a global receptive field is obtained. In the first fully connected layer, the number of parameters and the amount of computation are greatly reduced by reducing the channel dimension. After the nonlinear activation function, a second fully connected layer restores the original number of channels, completing the modeling of correlations between channels. See Figure 4.
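The three steps (squeeze, excitation, scale) can be sketched in plain Python on nested lists. This is an illustrative sketch rather than the network's implementation: ReLU is assumed as the nonlinear activation, biases are omitted, and the weight matrices `w1` and `w2` are hypothetical inputs.

```python
import math


def se_block(feature, w1, w2):
    """Apply squeeze-and-excitation to a C x H x W feature map.

    feature: list of C channels, each a list of H rows of W floats.
    w1: (C/r) x C weights of the first fully connected layer.
    w2: C x (C/r) weights of the second fully connected layer.
    Returns the recalibrated feature map and the channel weights.
    """
    # Squeeze: global average pooling reduces each channel to one value
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature]

    # Excitation: FC (reduce) -> ReLU (assumed) -> FC (restore) -> Sigmoid
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, hidden))))
         for row in w2]

    # Scale: multiply every value in channel c by its weight s[c]
    out = [[[v * s_c for v in row] for row in ch]
           for ch, s_c in zip(feature, s)]
    return out, s
```

For example, with two channels, `w1 = [[1.0, 1.0]]` and `w2 = [[0.0], [0.0]]`, the Sigmoid receives zero for both channels, so each channel is scaled by 0.5.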

Use of the Feature Pyramid Structure
We used a feature pyramid structure, FPN, to replace the PANet path aggregation structure. PANet is an improved version of FPN that adds a bottom-up path after the top-down path to achieve feature fusion. Such a structure is more beneficial to classification and positioning, but it also greatly increases the cost of computing. The object features to be detected in our study are not complex, and the difference between the two structures is not obvious. Because we wanted to reduce the computation and complexity of the network, the FPN structure was used for feature fusion.

Improved YOLOv4 Algorithm
The structure of the detection network is shown in Figure 5. We retained the backbone feature extraction network of YOLOv4 and added the SE module after the three output layers and after up-sampling to improve positioning accuracy. The CSP-SPP module described above was then attached to the backbone to reduce parameters while enlarging the receptive field, and finally the FPN structure was used to fuse the features and output the target.


Barcode Decoding
To decode the barcode on the courier sheet, we chose the ZBar algorithm. ZBar is an open-source barcode reading algorithm. It can read barcodes from a variety of sources, such as image files and videos, and supports a variety of barcode types, including EAN-13/UPC-A, UPC-E, EAN-8, Code128, Code39, and QR. The barcodes in our data were mainly Code128, as shown in Figure 6. Code128 consists of a series of parallel bars and blanks divided from left to right into left margin, start bit, data, validator, end bit, and right margin.
(1) Band Code. Each bar and each blank is assigned a value of 1, 2, 3 or 4 according to its width, from thinnest to thickest. Reading these values in sequence gives the Band Code of the barcode.
(2) Left and right blank areas. A blank space should be left on both sides of the barcode, with a width of 10 times the unit width (the unit width is the width of a stripe with value 1), allowing the barcode reader to enter the ready-to-read state.
(3) Start bit. The bars and blanks detected in the first area of the barcode, which is the beginning of its visible part, consist of six interleaved bars and blanks of different thickness, with a total of 11 unit widths. In Code128, the start bits of code sets A, B, and C are 211412, 211214, and 211232 respectively. The type of Code128 is determined by the start bit.
(4) Data. The data area expresses the coded information of the barcode and is composed of multiple characters. Each character also consists of six bars and blanks.
(5) Validator. This is used to verify the validity of the barcode. The modulo-103 checksum method is adopted, and the calculation method [31] is shown in Equation (10), where N is the value of each data character.
(6) End character. This indicates the end of the barcode; it is fixed, and the corresponding Band Code is 2331112.
After the image is put into the detection network to locate the barcode, the rectangular box containing the barcode is cropped. The cropped image is passed to the ZBar algorithm, which scans and analyzes it and determines the Band Code from the widths of the bars and blanks in order to extract the character information contained in the barcode.
As shown in Figure 6, the string "ST089030003" was identified by this algorithm, so that the sorting information of the express parcel could be retrieved from the database.
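The start patterns and the modulo-103 validator described above can be sketched as follows. This is an illustrative sketch of the standard Code128 rule (start value plus the position-weighted sum of character values, modulo 103), assumed here rather than copied from Equation (10). It covers only printable ASCII characters, whose value is `ord(ch) - 32` in code set B and, for characters up to `_`, in code set A. The function and constant names are hypothetical.

```python
# Band Code of each start character -> (code set, start value)
START_PATTERNS = {
    "211412": ("A", 103),
    "211214": ("B", 104),
    "211232": ("C", 105),
}
STOP_PATTERN = "2331112"


def code128_checksum(data, start_value=104):
    """Modulo-103 check value for printable ASCII data.

    start_value defaults to 104 (code set B). Each character value is
    multiplied by its 1-based position before summing.
    """
    total = start_value
    for position, ch in enumerate(data, start=1):
        value = ord(ch) - 32
        if not 0 <= value <= 95:
            raise ValueError(f"unsupported character: {ch!r}")
        total += position * value
    return total % 103


if __name__ == "__main__":
    code_set, start = START_PATTERNS["211214"]
    print(code_set, code128_checksum("ST089030003", start))
```

A decoder would compare this value with the validator character read from the barcode and reject the scan if the two differ.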

Recognition of Three Segments of Code
Three-segment code characters are mainly printed characters combining digits, hyphens and English letters. After obtaining pictures containing three-segment code characters, optical character recognition (OCR) is required. Only when the string is obtained can the corresponding sorting information be retrieved from the database. The collected data showed that the characters on a package are inevitably distorted during transportation and that the recognition environment can be complex, both of which affect recognition accuracy. Therefore, Tesseract was used for recognition. Tesseract is an open-source OCR engine. Its fourth-generation version supports deep learning OCR, recognizes image files in multiple formats, and converts them to text. Figure 7 shows the style of the three-segment code on a courier sheet.
After obtaining the rectangular box containing the three-segment code, we used OpenCV and Tesseract together for text recognition. As shown in Figure 8, after the image is input, we use OpenCV's EAST text detector to detect the text in the image. The EAST text detector provides the bounding box coordinates of each text ROI. We extract each text ROI and input it into the LSTM deep learning text recognition algorithm of Tesseract v4. Finally, the output of the LSTM provides the actual OCR result, which is a string. After obtaining the string, we look up the sorting information represented by the corresponding code in the database.
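Once both strings are available, the database lookup can draw on either of them. The sketch below shows one possible way to combine the two results; the fallback order, the three-segment pattern, the helper names and the dictionary "databases" are all assumptions made for illustration and are not described in this paper.

```python
import re

# Hypothetical format: three alphanumeric groups separated by hyphens
SEGMENT_CODE = re.compile(r"^[0-9A-Z]+(-[0-9A-Z]+){2}$")


def normalize_ocr(text):
    """Remove whitespace that OCR often inserts and upper-case letters."""
    return re.sub(r"\s+", "", text).upper()


def lookup_sorting_info(barcode, segment_text, barcode_db, segment_db):
    """Return sorting info from the barcode, else from the segment code.

    barcode: decoded barcode string, or None if decoding failed.
    segment_text: raw OCR string, or None if recognition failed.
    barcode_db / segment_db: dicts standing in for the real database.
    """
    if barcode and barcode in barcode_db:
        return barcode_db[barcode]
    if segment_text:
        code = normalize_ocr(segment_text)
        if SEGMENT_CODE.match(code) and code in segment_db:
            return segment_db[code]
    return None
```

With this arrangement, a parcel whose barcode is damaged can still be sorted as long as the three-segment code is readable, which is the motivation for using both kinds of information.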

Dataset
A dataset was created to simulate the logistics environment. It contained a total of 1680 images, mainly captured by cameras. The pictures of express delivery sheets covered multiple express companies, different materials and different sizes, and were sampled under different lighting conditions and at different angles. After obtaining the dataset, we used the open-source labeling tool labelimg to label it in the VOC dataset format. Before training, we also augmented the dataset, improving the generalization performance of the model by adjusting the image rotation angle, hue, saturation and other properties. Finally, the dataset was divided into a training set and a test set with a ratio of 9:1. Each sample corresponded to two files: (1) a JPG file with the image of the package containing the express receipt, and (2) an XML file that stores the image information, labels and coordinates of the regions of interest in the image.
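Reading one annotation file and splitting the samples 9:1 can be sketched with the Python standard library alone. The element names follow the usual VOC layout written by labelimg; the class names in the sample, the file name and the random seed are hypothetical.

```python
import random
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<annotation>
  <filename>parcel_0001.jpg</filename>
  <object>
    <name>barcode</name>
    <bndbox><xmin>40</xmin><ymin>60</ymin><xmax>300</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>three_segment_code</name>
    <bndbox><xmin>50</xmin><ymin>150</ymin><xmax>280</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = [int(bb.findtext(tag))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes


def split_dataset(items, train_ratio=0.9, seed=0):
    """Shuffle reproducibly and split into training and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_ratio)
    return items[:cut], items[cut:]


if __name__ == "__main__":
    print(parse_voc(SAMPLE_XML))
    train, test = split_dataset(range(1680))
    print(len(train), len(test))
```

Applied to 1680 samples, the 9:1 ratio gives 1512 training samples and 168 test samples before any augmentation.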

Experimental Environment and Training Process
This experiment used the 64-bit Windows 10 operating system and the PyTorch neural network framework. The hardware configuration comprised an Intel(R) Core(TM) i9-10900K CPU @ 3.70 GHz, 64 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti GPU.
In the object detection network experiment, the size of the input image was 416 × 416, the batch size was 16, the maximum number of iterations was 100, the initial learning rate was 0.001, and the attenuation coefficient was 0.0005. The ratio of training set to test set was 9:1.
In addition, a pre-trained model was used in the detection network, so that an accurate model could be obtained in a short time through transfer learning.

Evaluation Index of Experimental Results
The target detection model is applied in a distribution center for real-time detection, so detection speed and accuracy are the most important evaluation criteria. The experiment used frames per second (FPS) as the speed index. The FPS value is the number of pictures that can be processed per second, so a higher FPS means faster detection. The FPS data were obtained with the configuration described above. The average precision (AP), precision (P), recall (R), F1-measure (F1), model size and FPS were used to evaluate network performance. The calculation formulas of P, AP and F1 are expressed as Equations (10)-(12). Among them, TP represents positive samples predicted to be positive, FP represents negative samples predicted to be positive, and FN represents positive samples predicted to be negative.
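These indices can be computed directly from the TP, FP and FN counts. The sketch below uses the standard definitions P = TP / (TP + FP), R = TP / (TP + FN) and F1 = 2PR / (P + R). The AP function uses all-point interpolation of the precision-recall curve, which is one common convention and is an assumption here, since the exact AP variant used in the experiments is not stated.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of real positives that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    recalls must be sorted in ascending order, with precisions aligned.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


if __name__ == "__main__":
    p, r = precision(8, 2), recall(8, 2)
    print(p, r, f1_score(p, r))
    print(average_precision([0.5, 1.0], [1.0, 0.5]))
```

In the small example, 8 true positives with 2 false positives and 2 false negatives give P = R = 0.8, and the two-point precision-recall curve has an AP of 0.75.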

Improved YOLOv4 Model Evaluation
The results of the improved YOLOv4 model are shown in Figure 9. It can be seen from various indicators that the experimental results of this positioning network model were good.

Ablation Experiments
In this section, the network in which the SPP structure is combined with the CSP structure is denoted as CS-YOLOv4, and the network to which the SE module is further added is denoted as SCS-YOLOv4. Experimental testing showed that the performance of our model improved in several respects.
It can be seen from Tables 2 and 3 that the optimized network SCS-YOLOv4 improves on the YOLOv4 network to different degrees in AP, P, FPS and size. In particular, in the detection of three-segment codes, the AP value increased by 1.7 percentage points, and the P value increased by 3.5 percentage points.

Comparative Experiments of Different Models
In our study, commonly used positioning models and the SCS-YOLOv4 model were selected for performance comparison. Tables 4 and 5 compare the models with respect to five aspects: AP, F1, P, FPS and size. All the results were obtained on the same dataset. The two tables show the differences between the models clearly. The two-stage target detection network Faster R-CNN has a significant advantage in accuracy, but its detection speed is too slow and the model is too large. The SSD300 model is faster and has a more suitable size, but its accuracy is slightly lower than that of the other models. The YOLOv4 network balances speed and accuracy and performs well overall. Because speed and accuracy are both important indicators for sorting express deliveries in a logistics center, and our network improves on the YOLOv4 network in several respects, we used our network to position the target information.

Experimental Results Analysis
After the SCS-YOLOv4 algorithm had completed positioning, the information was recognized. During recognition, we found that it was difficult to identify a picture with a large deflection angle accurately and quickly, so the deflection had to be corrected. We cropped the image to the extent of the bounding box, processed the cropped image with the edge detection algorithm in OpenCV, and used the minAreaRect() method to obtain the deflection angle of the image. Finally, an affine transformation was used to correct the deflected image. After the corrected picture was obtained, we used the ZBar algorithm to decode the bar code and Tesseract to recognize the three-segment code.
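The affine correction step can be illustrated with a small sketch in pure Python. It builds the 2 × 3 rotation matrix about a center point in the same form that OpenCV documents for `cv2.getRotationMatrix2D` (a positive angle rotates counter-clockwise in image coordinates) and applies it to the corner points of a region. The deflection angle is assumed to be known already, for example from minAreaRect(); the function names and the sign convention of that angle are assumptions of this sketch, not details given in the paper.

```python
import math


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix rotating by angle_deg about center (image coordinates)."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ]


def apply_affine(matrix, point):
    """Map one (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


def correct_deflection(corners, center, deflection_deg):
    """Undo a known deflection by rotating the corners by the opposite angle."""
    matrix = rotation_matrix(center, -deflection_deg)
    return [apply_affine(matrix, corner) for corner in corners]
```

In a full pipeline the same matrix would be passed to an image warping routine so that every pixel of the cropped region, not only its corners, is rotated back before decoding.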

Evaluation of Experimental Results
In the logistics environment, recognition accuracy P and recognition speed S are the important indicators for real-time recognition. P is defined in Equation (14):
$$P = \frac{N_1}{N} \times 100\% \tag{14}$$
where N is the number of samples, and N_1 is the number of correctly identified samples.

Barcode Decoding Test
We selected 200 bar code pictures as samples for the bar code recognition experiment. The experimental results are shown in Table 6.

Three-Segment Code Identification Test
We selected 200 images of three-segment codes as samples for the three-segment code recognition experiment. We found that the images contained a lot of interfering information, which affected recognition accuracy. Therefore, after rotation correction, the three code regions were positioned to reduce the interference and increase the recognition accuracy. The experimental results are shown in Table 7.

Multi-Information Target Recognition Test
We analyzed the recognition results of 200 pictures. For the overall method, the success rate of express sorting recognition was 98.5%, counting sheets recognized from a single kind of information as well as sheets recognized from both. The results are shown in Table 8.

Time Performance
The three algorithm modules above were tested separately on a 64-bit Windows operating system with 16 GB of RAM and an Intel(R) Core(TM) i7-10875H CPU @ 2.30 GHz. The running times, in seconds (s), are shown in Table 9; the average running time for processing a 416 × 416 package image was 0.31 s.
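Per-module running times of this kind can be collected with a simple wall-clock harness. The sketch below is one possible way to do it and is not the measurement code used in the study: the helper names, the stage names a caller would pass in, and the injectable `clock` argument are all assumptions. Each stage is any callable that takes the previous stage's output.

```python
import time


def time_stages(stages, data, clock=time.perf_counter):
    """Run (name, function) stages in order; return the final output and timings."""
    timings = {}
    for name, stage in stages:
        start = clock()
        data = stage(data)
        timings[name] = clock() - start
    return data, timings


def images_per_second(seconds_per_image):
    """Convert an average per-image running time into throughput."""
    return 1.0 / seconds_per_image
```

An average of 0.31 s per image corresponds to an end-to-end throughput of about 3.2 images per second (`images_per_second(0.31)`).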

Comparison of Different Express Sorting Methods
We compared several express sorting methods, as shown in Table 10. It can be seen from Table 10 that the method of Liu W et al. has better time performance, reaching 0.11 s, but its recognition accuracy is low. Our method is slower, at 0.31 s, but it attains 98.5% accuracy and is more robust than the other methods.

Conclusions
To address the low recognition rate of single-information sorting methods in small semi-automatic sorting centers, our research proposes a fast recognition method for courier sheets based on deeper learning, using multi-information fusion of a one-dimensional barcode and a three-segment code. The experimental results show that the method can obtain the information on the courier sheet accurately. Because the overall recognition time is longer than that of single-information recognition, the barcode information can be taken as the main information and the three-segment code information as auxiliary information: the three-segment code is recognized only when the barcode cannot be identified, which reduces the sorting time. In general, although this method is slower than single-information recognition, it ensures recognition accuracy and has good robustness.
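The main/auxiliary strategy described above can be sketched as a simple fallback. In this sketch the two decoders are passed in as callables that return a string on success and `None` (or an empty string) on failure; these callables stand in for the ZBar and Tesseract steps, and their signatures, like the name `recognize_sheet`, are assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple

# A decoder takes a cropped image region and returns the decoded text, if any.
Decoder = Callable[[object], Optional[str]]


def recognize_sheet(
    barcode_crop: object,
    code_crop: object,
    decode_barcode: Decoder,
    read_three_segment_code: Decoder,
) -> Tuple[str, Optional[str]]:
    """Try the barcode first; read the three-segment code only if that fails."""
    text = decode_barcode(barcode_crop)
    if text:
        return "barcode", text
    text = read_three_segment_code(code_crop)
    if text:
        return "three_segment_code", text
    return "unrecognized", None
```

Because the second decoder is called only when the first one fails, the slower recognition step is skipped for every sheet whose barcode is readable.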

Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available because the dataset consists of courier packages, which carry a large amount of personal information.

Conflicts of Interest: The authors declare no conflict of interest.